- Millennium’s deputy head of quant strategies said new alt datasets are lacking in long-term alpha generation.
- New datasets focus on themes, such as data on Reddit traders.
- There hasn’t been a can’t-miss dataset to come out in years, he said.
If you can think of a dataset, it’s most likely out there – and if you can’t find it, then you aren’t looking hard enough.
At least that’s what Matthew Rothman – the deputy head of quant strategies for Izzy Englander’s $52 billion Millennium – said at Tuesday’s Neudata digital conference.
Rothman, who has helped build up Millennium’s new quant platform with former Cubist head Ross Garon over the last 12 months, said the tidal wave of data has made some people forget data’s purpose in the investing game: alpha.
“We were almost like kids in a candy store” five years ago, when new, multi-use datasets were being introduced, Rothman said. Credit-card data, geolocation data, cell-phone data – all of it can be used across different industries and stocks.
The reality, though, is that there hasn't been a can't-miss dataset in years, according to Rothman, who also teaches at MIT.

"It's been a while since I saw something where I was like 'wow, I've never seen something like this before,'" he said.
The drought is hurting alpha at the largest quant funds, which need strong signals that can be applied to many different securities, Rothman said. One way Millennium's quant platform has protected against this is by recruiting "low-capacity, high-Sharpe" quant teams to run a multitude of diverse strategies instead of piling billions into bread-and-butter quant strategies.
With smaller teams focused on the more obscure corners of quant investing, Rothman said the platform can more easily find alpha.
The firm is still searching for the next multi-use dataset, using both internal teams and outside consultants such as Neudata, a London-based data company. But Rothman has found the most important recent datasets to be more thematic than secular, such as data on Reddit traders.
"It's not a cross-sectional signal the way credit-card data was," Rothman said.
He outlined four subgenres of alternative data that deliver alpha: speed, granular, curated, and distinct. Speed and granular datasets provide information more quickly and in more depth, respectively, than traditional data sources, giving an investor an advantage.
Curated datasets involve combining multiple streams of information to create something new that no one else has – a clear form of alpha that also requires a lot of resources, such as data scientists. Distinct data, though, has been missing in the marketplace for years, Rothman said.
"It used to be a lot more common" to find "once-in-a-lifetime data that no one else had," he said, but as alternative data's popularity grew, so did the number of people searching for the best information.
Because of this, many of the top quants have been building out bigger and bigger data science and engineering teams so they can create more curated datasets out of the information they already have. Two Sigma, for instance, just hired a former senior Google engineer to be its head of data engineering and is planning to hire aggressively for his team.
Millennium is no different. Rothman said he expects people to know Python and statistics, but he also hopes they have a "deep skepticism."
"We want people who aren't afraid to call hogwash on things, even if it's their own work," he said. While everyone is getting more advanced – using machine-learning programs and artificial intelligence to read earnings transcripts – Rothman said he interviews too many engineers who struggle with the basics.
"Worry about if your data is clean" before embarking on a massive project, he said.
"What we are missing in data engineering and data science is common sense."